{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/NeuromatchAcademy/course-content/blob/master/tutorials/W1D3_ModelFitting/W1D3_Tutorial1.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "# Neuromatch Academy: Week 1, Day 3, Tutorial 1\n",
    "# Model Fitting: Linear regression with MSE\n",
    "\n",
    "**Content creators**: Pierre-Étienne Fiquet, Anqi Wu, Alex Hyafil with help from Byron Galbraith\n",
    "\n",
    "**Content reviewers**: Lina Teichmann, Saeed Salehi, Patrick Mineault,  Ella Batty, Michael Waskom\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "___\n",
    "#Tutorial Objectives\n",
    "\n",
    "This is Tutorial 1 of a series on fitting models to data. We start with simple linear regression, using least squares optimization (Tutorial 1) and Maximum Likelihood Estimation (Tutorial 2). We will use bootstrapping to build confidence intervals around the inferred linear model parameters (Tutorial 3). We'll finish our exploration of regression models by generalizing to multiple linear regression and polynomial regression (Tutorial 4). We end by learning how to choose between these various models. We discuss the bias-variance trade-off (Tutorial 5) and Cross Validation for model selection (Tutorial 6).\n",
    "\n",
    "In this tutorial, we will learn how to fit simple linear models to data.\n",
    "- Learn how to calculate the mean-squared error (MSE) \n",
    "- Explore how model parameters (slope) influence the MSE\n",
    "- Learn how to find the optimal model parameter using least-squares optimization\n",
    "\n",
    "---\n",
    "\n",
    "**acknowledgements:** \n",
    "- we thank Eero Simoncelli, much of today's tutorials are inspired by exercises asigned in his mathtools class."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "---\n",
    "# Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "#@title Figure Settings\n",
    "import ipywidgets as widgets       # interactive display\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/master/nma.mplstyle\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "#@title Helper functions\n",
    "\n",
    "def plot_observed_vs_predicted(x, y, y_hat, theta_hat):\n",
    "  \"\"\" Plot observed vs predicted data\n",
    "\n",
    "  Args:\n",
    "      x (ndarray): observed x values\n",
    "  y (ndarray): observed y values\n",
    "  y_hat (ndarray): predicted y values\n",
    "\n",
    "  \"\"\"\n",
    "  fig, ax = plt.subplots()\n",
    "  ax.scatter(x, y, label='Observed')  # our data scatter plot\n",
    "  ax.plot(x, y_hat, color='r', label='Fit')  # our estimated model\n",
    "  # plot residuals\n",
    "  ymin = np.minimum(y, y_hat)\n",
    "  ymax = np.maximum(y, y_hat)\n",
    "  ax.vlines(x, ymin, ymax, 'g', alpha=0.5, label='Residuals')\n",
    "  ax.set(\n",
    "      title=fr\"$\\hat{{\\theta}}$ = {theta_hat:0.2f}, MSE = {mse(x, y, theta_hat):.2f}\",\n",
    "      xlabel='x',\n",
    "      ylabel='y'\n",
    "  )\n",
    "  ax.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "---\n",
    "# Section 1: Mean Squared Error (MSE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 517
    },
    "colab_type": "code",
    "outputId": "51b28d5d-deb1-4d3f-8539-a2aa1e0e0916"
   },
   "outputs": [],
   "source": [
    "#@title Video 1: Linear Regression & Mean Squared Error\n",
    "from IPython.display import YouTubeVideo\n",
    "video = YouTubeVideo(id=\"HumajfjJ37E\", width=854, height=480, fs=1)\n",
    "print(\"Video available at https://youtube.com/watch?v=\" + video.id)\n",
    "video"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "**Linear least squares regression** is an old but gold  optimization procedure that we are going to use for data fitting. Least squares (LS) optimization problems are those in which the objective function is a quadratic function of the\n",
    "parameter(s) being optimized.\n",
    "\n",
    "Suppose you have a set of measurements, $y_{n}$ (the \"dependent\" variable) obtained for different input values, $x_{n}$ (the \"independent\" or \"explanatory\" variable). Suppose we believe the measurements are proportional to the input values, but are corrupted by some (random) measurement errors, $\\epsilon_{n}$, that is:\n",
    "\n",
    "$$y_{n}= \\theta x_{n}+\\epsilon_{n}$$\n",
    "\n",
    "for some unknown slope parameter $\\theta.$ The least squares regression problem uses **mean squared error (MSE)** as its objective function, it aims to find the value of the parameter $\\theta$ by minimizing the average of squared errors:\n",
    "\n",
    "\\begin{align}\n",
    "\\min _{\\theta} \\frac{1}{N}\\sum_{n=1}^{N}\\left(y_{n}-\\theta x_{n}\\right)^{2}\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "We will now explore how MSE is used in fitting a linear regression model to data. For illustrative purposes, we will create a simple synthetic dataset where we know the true underlying model. This will allow us to see how our estimation efforts compare in uncovering the real model (though in practice we rarely have this luxury).\n",
    "\n",
    "First we will generate some noisy samples $x$ from [0, 10) along the line $y = 1.2x$ as our dataset we wish to fit a model to."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 430
    },
    "colab_type": "code",
    "outputId": "fe5008ca-5d2a-44be-83a3-655fe57b7d1a"
   },
   "outputs": [],
   "source": [
    "# @title\n",
    "\n",
    "# @markdown Execute this cell to generate some simulated data\n",
    "\n",
    "# setting a fixed seed to our random number generator ensures we will always\n",
    "# get the same psuedorandom number sequence\n",
    "np.random.seed(121)\n",
    "\n",
    "# Let's set some parameters\n",
    "theta = 1.2\n",
    "n_samples = 30\n",
    "\n",
    "# Draw x and then calculate y\n",
    "x = 10 * np.random.rand(n_samples)  # sample from a uniform distribution over [0,10)\n",
    "noise = np.random.randn(n_samples)  # sample from a standard normal distribution\n",
    "y = theta * x + noise\n",
    "\n",
    "# Plot the results\n",
    "fig, ax = plt.subplots()\n",
    "ax.scatter(x, y)  # produces a scatter plot\n",
    "ax.set(xlabel='x', ylabel='y');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "Now that we have our suitably noisy dataset, we can start trying to estimate the underlying model that produced it. We use MSE to evaluate how successful a particular slope estimate $\\hat{\\theta}$ is for explaining the data, with the closer to 0 the MSE is, the better our estimate fits the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Exercise 1: Compute MSE\n",
    "\n",
    "In this exercise you will implement a method to compute the mean squared error for a set of inputs $x$, measurements $y$, and slope estimate $\\hat{\\theta}$. We will then compute and print the mean squared error for 3 different choices of theta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "def mse(x, y, theta_hat):\n",
    "  \"\"\"Compute the mean squared error\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): An array of shape (samples,) that contains the input values.\n",
    "    y (ndarray): An array of shape (samples,) that contains the corresponding\n",
    "      measurement values to the inputs.\n",
    "    theta_hat (float): An estimate of the slope parameter\n",
    "\n",
    "  Returns:\n",
    "    float: The mean squared error of the data with the estimated parameter.\n",
    "  \"\"\"\n",
    "  ####################################################\n",
    "  ## TODO for students: compute the mean squared error\n",
    "  # Fill out function and remove\n",
    "  raise NotImplementedError(\"Student exercise: compute the mean squared error\")\n",
    "  ####################################################\n",
    "\n",
    "  # Compute the estimated y\n",
    "  y_hat = ...\n",
    "\n",
    "  # Compute mean squared error\n",
    "  mse = ...\n",
    "\n",
    "  return mse\n",
    "\n",
    "\n",
    "# Uncomment below to test your function\n",
    "theta_hats = [0.75, 1.0, 1.5]\n",
    "# for theta_hat in theta_hats:\n",
    "#   print(f\"theta_hat of {theta_hat} has an MSE of {mse(x, y, theta_hat):.2f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 70
    },
    "colab_type": "code",
    "outputId": "3ed61eef-3d5e-4118-b0b8-dc2a299c461a"
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "def mse(x, y, theta_hat):\n",
    "  \"\"\"Compute the mean squared error\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): An array of shape (samples,) that contains the input values.\n",
    "    y (ndarray): An array of shape (samples,) that contains the corresponding\n",
    "      measurement values to the inputs.\n",
    "    theta_hat (float): An estimate of the slope parameter\n",
    "\n",
    "  Returns:\n",
    "    float: The mean squared error of the data with the estimated parameter.\n",
    "  \"\"\"\n",
    "\n",
    "  # Compute the estimated y\n",
    "  y_hat = theta_hat * x\n",
    "\n",
    "  # Compute mean squared error\n",
    "  mse = np.mean((y - y_hat)**2)\n",
    "\n",
    "  return mse\n",
    "\n",
    "\n",
    "theta_hats = [0.75, 1.0, 1.5]\n",
    "for theta_hat in theta_hats:\n",
    "  print(f\"theta_hat of {theta_hat} has an MSE of {mse(x, y, theta_hat):.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "The result should be:\n",
    "\n",
    "theta_hat of 0.75 has an MSE of 9.08\\\n",
    "theta_hat of 1.0 has an MSE of 3.0\\\n",
    "theta_hat of 1.5 has an MSE of 4.52\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "We see that $\\hat{\\theta} = 1.0$ is our best estimate from the three we tried. Looking just at the raw numbers, however, isn't always satisfying, so let's visualize what our estimated model looks like over the data. \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 303
    },
    "colab_type": "code",
    "outputId": "dd2f0648-e340-46b4-a977-c3b2c44aa5ad"
   },
   "outputs": [],
   "source": [
    "#@title\n",
    "\n",
    "#@markdown Execute this cell to visualize estimated models\n",
    "\n",
    "fig, axes = plt.subplots(ncols=3, figsize=(18, 4))\n",
    "for theta_hat, ax in zip(theta_hats, axes):\n",
    "\n",
    "  # True data\n",
    "  ax.scatter(x, y, label='Observed')  # our data scatter plot\n",
    "\n",
    "  # Compute and plot predictions\n",
    "  y_hat = theta_hat * x\n",
    "  ax.plot(x, y_hat, color='r', label='Fit')  # our estimated model\n",
    "\n",
    "  ax.set(\n",
    "      title= fr'$\\hat{{\\theta}}$= {theta_hat}, MSE = {mse(x, y, theta_hat):.2f}',\n",
    "      xlabel='x',\n",
    "      ylabel='y'\n",
    "  );\n",
    "\n",
    "axes[0].legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Interactive Demo: MSE Explorer\n",
    "\n",
    "Using an interactive widget, we can easily see how changing our slope estimate changes our model fit. We display the **residuals**, the differences between observed and predicted data, as line segments between the data point (observed response) and the corresponding predicted response on the model fit line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 462,
     "referenced_widgets": [
      "1ed7e01f92e64ad8a771f103472c0c70",
      "1de81a61d9cf4c14afedca745ab38e23",
      "1b51daf26cb944ce984e5f370321c100",
      "70b1ce1a72d14c2bbc18432013d41fdf",
      "15863e80fcbd4395875a08fa157db4d6",
      "a7e70751d29045c2ba64b9ea59f17283",
      "06095e58574a4927a959300da45067fd"
     ]
    },
    "colab_type": "code",
    "outputId": "05cee131-a8a2-41b2-fa9d-57db055756a2"
   },
   "outputs": [],
   "source": [
    "#@title\n",
    "\n",
    "#@markdown Make sure you execute this cell to enable the widget!\n",
    "\n",
    "@widgets.interact(theta_hat=widgets.FloatSlider(1.0, min=0.0, max=2.0))\n",
    "def plot_data_estimate(theta_hat):\n",
    "  y_hat = theta_hat * x\n",
    "  plot_observed_vs_predicted(x, y, y_hat, theta_hat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "While visually exploring several estimates can be instructive, it's not the most efficient for finding the best estimate to fit our data. Another technique we can use is choose a reasonable range of parameter values and compute the MSE at several values in that interval. This allows us to plot the error against the parameter value (this is also called an **error landscape**, especially when we deal with more than one parameter). We can select the final $\\hat{\\theta}$ ($\\hat{\\theta}_{MSE}$) as the one which results in the lowest error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 430
    },
    "colab_type": "code",
    "outputId": "8d4999bf-1ad9-4f06-bf3b-d61fbaaab837"
   },
   "outputs": [],
   "source": [
    "# @title\n",
    "\n",
    "# @markdown Execute this cell to loop over theta_hats, compute MSE, and plot results\n",
    "\n",
    "# Loop over different thetas, compute MSE for each\n",
    "theta_hat_grid = np.linspace(-2.0, 4.0)\n",
    "errors = np.zeros(len(theta_hat_grid))\n",
    "for i, theta_hat in enumerate(theta_hat_grid):\n",
    "  errors[i] = mse(x, y, theta_hat)\n",
    "\n",
    "# Find theta that results in lowest error\n",
    "best_error = np.min(errors)\n",
    "theta_hat = theta_hat_grid[np.argmin(errors)]\n",
    "\n",
    "\n",
    "# Plot results\n",
    "fig, ax = plt.subplots()\n",
    "ax.plot(theta_hat_grid, errors, '-o', label='MSE', c='C1')\n",
    "ax.axvline(theta, color='g', ls='--', label=r\"$\\theta_{True}$\")\n",
    "ax.axvline(theta_hat, color='r', ls='-', label=r\"$\\hat{{\\theta}}_{MSE}$\")\n",
    "ax.set(\n",
    "  title=fr\"Best fit: $\\hat{{\\theta}}$ = {theta_hat:.2f}, MSE = {best_error:.2f}\",\n",
    "  xlabel=r\"$\\hat{{\\theta}}$\",\n",
    "  ylabel='MSE')\n",
    "ax.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "We can see that our best fit is $\\hat{\\theta}=1.18$ with an MSE of 1.45. This is quite close to the original true value $\\theta=1.2$!\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "---\n",
    "# Section 2: Least-squares optimization\n",
    "\n",
    "\n",
    "While the approach detailed above (computing MSE at various values of $\\hat\\theta$) quickly got us to a good estimate, it still relied on evaluating the MSE value across a grid of hand-specified values. If we didn't pick a good range to begin with, or with enough granularity, we might miss the best possible estimator. Let's go one step further, and instead of finding the minimum MSE from a set of candidate estimates, let's solve for it analytically.\n",
    "\n",
    "We can do this by minimizing the cost function. Mean squared error is a convex objective function, therefore we can compute its minimum using calculus. Please see video or appendix for this derivation! After computing the minimum, we find that:\n",
    "\n",
    "\\begin{align}\n",
    "\\hat\\theta = \\frac{\\vec{x}^\\top \\vec{y}}{\\vec{x}^\\top \\vec{x}}\n",
    "\\end{align}\n",
    "\n",
    "This is known as solving the normal equations. For different ways of obtaining the solution, see the notes on [Least Squares Optimization](https://www.cns.nyu.edu/~eero/NOTES/leastSquares.pdf) by Eero Simoncelli."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "### Exercise 2: Solve for the Optimal Estimator\n",
    "\n",
    "In this exercise, you will write a function that finds the optimal $\\hat{\\theta}$ value using the least squares optimization approach (the equation above) to solve MSE minimization. It shoud take arguments $x$ and $y$ and return the solution $\\hat{\\theta}$.\n",
    "\n",
    "We will then use your function to compute $\\hat{\\theta}$ and plot the resulting prediction on top of the data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code"
   },
   "outputs": [],
   "source": [
    "def solve_normal_eqn(x, y):\n",
    "  \"\"\"Solve the normal equations to produce the value of theta_hat that minimizes\n",
    "    MSE.\n",
    "\n",
    "    Args:\n",
    "    x (ndarray): An array of shape (samples,) that contains the input values.\n",
    "    y (ndarray): An array of shape (samples,) that contains the corresponding\n",
    "      measurement values to the inputs.\n",
    "\n",
    "  Returns:\n",
    "    float: the value for theta_hat arrived from minimizing MSE\n",
    "  \"\"\"\n",
    "\n",
    "  ################################################################################\n",
    "  ## TODO for students: solve for the best parameter using least squares\n",
    "  # Fill out function and remove\n",
    "  raise NotImplementedError(\"Student exercise: solve for theta_hat using least squares\")\n",
    "  ################################################################################\n",
    "\n",
    "  # Compute theta_hat analytically\n",
    "  theta_hat = ...\n",
    "\n",
    "  return theta_hat\n",
    "\n",
    "\n",
    "# Uncomment below to test your function\n",
    "# theta_hat = solve_normal_eqn(x, y)\n",
    "# y_hat = theta_hat * x\n",
    "# plot_observed_vs_predicted(x, y, y_hat, theta_hat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 465
    },
    "colab_type": "code",
    "outputId": "5a46f798-31e1-4fbd-da7a-de8a270c5f7d"
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "def solve_normal_eqn(x, y):\n",
    "  \"\"\"Solve the normal equations to produce the value of theta_hat that minimizes\n",
    "    MSE.\n",
    "\n",
    "    Args:\n",
    "    x (ndarray): An array of shape (samples,) that contains the input values.\n",
    "    y (ndarray): An array of shape (samples,) that contains the corresponding\n",
    "      measurement values to the inputs.\n",
    "\n",
    "  Returns:\n",
    "    float: the value for theta_hat arrived from minimizing MSE\n",
    "  \"\"\"\n",
    "\n",
    "  # Compute theta_hat analytically\n",
    "  theta_hat = (x.T @ y) / (x.T @ x)\n",
    "\n",
    "  return theta_hat\n",
    "\n",
    "\n",
    "theta_hat = solve_normal_eqn(x, y)\n",
    "y_hat = theta_hat * x\n",
    "\n",
    "with plt.xkcd():\n",
    "  plot_observed_vs_predicted(x, y, y_hat, theta_hat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "We see that the analytic solution produces an even better result than our grid search from before, producing $\\hat{\\theta} = 1.21$ with MSE = 1.43!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "---\n",
    "# Summary\n",
    "\n",
    "- Linear least squares regression is an optimization procedure that can be used for data fitting:\n",
    "    - Task: predict a value for $y$ given $x$\n",
    "    - Performance measure: $\\textrm{MSE}$\n",
    "    - Procedure: minimize $\\textrm{MSE}$ by solving the normal equations\n",
    "- **Key point**: We fit the model by defining an *objective function* and minimizing it. \n",
    "- **Note**: In this case, there is an *analytical* solution to the minimization problem and in practice, this solution can be computed using *linear algebra*. This is *extremely* powerful and forms the basis for much of numerical computation throughout the sciences."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "---\n",
    "# Appendix"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text"
   },
   "source": [
    "## Least Squares Optimization Derivation\n",
    "\n",
    "We will outline here the derivation of the least squares solution.\n",
    "\n",
    "We first set the derivative of the error expression with respect to $\\theta$ equal to zero, \n",
    "\n",
    "\\begin{align}\n",
    "\\frac{d}{d\\theta}\\frac{1}{N}\\sum_{i=1}^N(y_i - \\theta x_i)^2 = 0 \\\\\n",
    "\\frac{1}{N}\\sum_{i=1}^N-2x_i(y_i - \\theta x_i) = 0\n",
    "\\end{align}\n",
    "\n",
    "where we used the chain rule. Now solving for $\\theta$, we obtain an optimal value of:\n",
    "\n",
    "\\begin{align}\n",
    "\\hat\\theta = \\frac{\\sum_{i=1}^N x_i y_i}{\\sum_{i=1}^N x_i^2}\n",
    "\\end{align}\n",
    "\n",
    "Which we can write in vector notation as:\n",
    "\n",
    "\\begin{align}\n",
    "\\hat\\theta = \\frac{\\vec{x}^\\top \\vec{y}}{\\vec{x}^\\top \\vec{x}}\n",
    "\\end{align}\n",
    "\n",
    "\n",
    "This is known as solving the *normal equations*. For different ways of obtaining the solution, see the notes on [Least Squares Optimization](https://www.cns.nyu.edu/~eero/NOTES/leastSquares.pdf) by Eero Simoncelli."
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "W1D3_Tutorial1",
   "provenance": [],
   "toc_visible": true
  },
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
